Abstract

According to the media, in spring of this year the experiment CDF at Fermilab had most likely ("this result has a 99.7 percent chance of being correct" [1]) made a great discovery ("the most significant in physics in half a century" [2]). However, from the very beginning, practically no particle physics expert believed that was the case. This is the latest of quite a long series of fake claims based on trivial mistakes in probabilistic reasoning, which can be sketched with the following statements, understandable by everybody:

- the probability of a senator being a woman is not the same as the probability of a woman being a senator;
- a free neutron has only a 3 × 10^-4 probability of decaying after two hours, but, if we observe a neutron decaying after such a time, this is not an indication of an anomalous behavior of such a particle;
- the probability of a Gaussian random generator with μ = 0 and σ = 1 producing a number equal to 3.000, rounded to three decimal digits, is 4.4 × 10^-6. This does not allow us to say that, once this number has been observed, there is only a 4.4 × 10^-6 probability that it comes from that generator. Nor is 4.4 × 10^-6 the probability that 3.000 is a statistical fluctuation;
- still considering the latter numerical example, we cannot even say that the probability of 3.000 being a statistical fluctuation is 1.3 × 10^-3, 'because' this is the probability of such a generator producing a number larger than or equal to the observed one.

The main purpose of this note is to invite everybody, but especially journalists and the general public, who are most often innocent victims of misinformation of this kind, to mistrust claims not explicitly reported in terms of how much we should believe something, under well stated conditions and assumptions. (A last minute appendix has been added, with comments on the recent news concerning the Higgs at LHC.)
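The numbers quoted in the abstract can be checked with a few lines of code. This is a minimal sketch under stated assumptions. The free-neutron mean lifetime of about 880 s is an approximate value assumed here. The Gaussian generator is modeled with `statistics.NormalDist` from Python's standard library. Note that every printed number is a probability of the observation given the hypothesis (an ordinary neutron, a standard Gaussian generator). It is not the probability of the hypothesis given the observation.

```python
import math
from statistics import NormalDist

# Assumed free-neutron mean lifetime, roughly 880 s (approximate value, used only
# to reproduce the order of magnitude quoted in the abstract).
NEUTRON_LIFETIME_S = 880.0


def survival_probability(t_seconds, tau=NEUTRON_LIFETIME_S):
    """P(a free neutron has not yet decayed after t_seconds), exponential decay law."""
    return math.exp(-t_seconds / tau)


std_normal = NormalDist(mu=0.0, sigma=1.0)

# P(generator output rounds to 3.000), i.e. falls in [2.9995, 3.0005).
p_rounds_to_3 = std_normal.cdf(3.0005) - std_normal.cdf(2.9995)

# P(generator output >= 3.000): the one-sided tail used for p-values.
p_tail = 1.0 - std_normal.cdf(3.0)

p_neutron = survival_probability(2 * 3600)

print(f"P(neutron still undecayed after 2 h | ordinary neutron) = {p_neutron:.1e}")
print(f"P(x rounds to 3.000 | standard Gaussian generator)      = {p_rounds_to_3:.1e}")
print(f"P(x >= 3.000 | standard Gaussian generator)             = {p_tail:.1e}")
```

Running it prints about 2.8 × 10^-4, 4.4 × 10^-6 and 1.3 × 10^-3. None of these numbers, taken alone, says how probable the ordinary hypothesis is once the observation has been made.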



*Note based on lectures at the University of Perugia, 15-16 April 2011 and at MAPSES School in Lecce, 
23-25 November 2011 
1 Introduction



The title of this paper is a paraphrase of that of an article ("Probably guilty: Bad mathematics means rough justice"), which appeared in New Scientist in October 2009 [3]. Its incipit induced me to write a 'defense of Columbo'^1, which finally turned into a sui generis introduction to probabilistic reasoning [5].

Indeed, as I wrote in [5], "I can give firm witness that scientific practice is plenty of mistakes of the kind reported [in the cited New Scientist article], that happen even in fields the general public would hardly suspect, like frontier physics, whose protagonists are supposed to have a skill in mathematics superior to police officers and lawyers." In fact, although it might sound amazing, the 'bad mathematics' that induces a judge to form a wrong opinion about somebody's guilt or innocence is the same that, for example, in April of this year made the media and the general public believe that Fermilab had made "the most significant discovery in physics in half a century" [2], and that particle physicists strongly believed this (although they were not yet certain). How else should ordinary people interpret a statement such as "this result has a 99.7 percent chance of being correct" [1]?

Although I am quite used to fake claims of this kind, some of which are discussed in [6], and I have no interest in analyzing each individual case, the reasons I want to return to this subject here, focusing on the Fermilab case, are twofold. First, because of modern fast communication on the internet, I was involved in discussions about the issue 'discovery or not?'. I realized that even people with a university background in mathematics or physics were induced to think that it was most likely a discovery, or at least that Fermilab physicists were convinced this was the case. Second, a few days after the CDF announcement, I had to lecture PhD students in Perugia [7]. I therefore amused myself collecting related news and comments on the internet, because I expected that the claim would be a hot topic, as indeed it turned out to be. Finally, very recent interactions with students and young researchers in Lecce [8] convinced me to resume the paper draft started on the train from Rome to Perugia.

2 The facts 

On the 4th of April this year a paper appeared in the arXiv reporting on the "Invariant Mass Distribution of Jet Pairs Produced in Association with a W boson in pp̄ Collisions at √s = 1.96 TeV" [9]. The result was officially presented two days later in a 'special joint experimental-theoretical physics seminar' at Fermilab [10]. In the meantime, on the 5th of April, the article "At Particle Lab, a Tantalizing Glimpse Has Physicists Holding Their Breaths" appeared in The New York Times [2]. In the following days the news spread all around the world (you can amuse yourself by searching Google in the languages you know). Let us sketch how that happened.

^1 Besides the inappropriate reference to the Columbo episode, I consider that article substantially well done and I recommend reading it. To those interested in the subject "probability and the law" I also recommend refs. [4] as starting points for further navigation.
Figure 1: CDF data [9] before and after 'arithmetic' background subtraction
2.1 The (filtered and processed) data 

Figure 1 reports the upper plots of figure 1 of the cited CDF paper. The left-hand plot shows the histogram of the 'data'^2, i.e. the jet-jet mass distribution in 8 GeV bins, for a total of about 10800 events^3 (points with vertical bars). The colored regions show the predictions split into several contributions. The most important of these is the production of two W bosons, or of a W boson together with a Z boson (red). We see that at around 140 GeV there are more events than 'expected' (an expression to which we shall return later).

The right-hand plot shows the data after the contributions called here 'background' (all but the red one of the left plot) were subtracted 'arithmetically'^4. In the five bins between 120 and 160 GeV there are about 230 events (but in the side bins there are even 'negative events', whose meaning is only mathematical). That was 'the excess'.

^2 For non-experts it is important to clarify, although this is not deeply relevant here, that the histogram's 'data' are not simple 'empirical observations'. They are the result of selections and analysis (including calibrations), after suitable definitions of physical objects, such as what a 'jet' is.

^3 This number, as well as the 230 that follows, was estimated from the figure; precise numbers are irrelevant for the purpose of this paper.

^4 It seems rather natural to think that, if the purpose of a 'subtraction' is to highlight extra physical components in the spectrum, this procedure should not be simply an 'arithmetic subtraction'. In particular, it should not yield unphysical negative counts.
2.2 The statistically motivated claim

A customary way to quantify the difference between an observed spectrum and the expected one is the famous χ² statistic^5. The CDF paper reports a "χ² per degree of freedom" (χ²/ν) of 77.1/84 for the entire spectrum and 26.1/20 for the region 120-160 GeV. In both cases statistical practice based on this test states that "there is nothing to be surprised about".
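As a quick check of the statement above, the χ² tail probabilities for the reported values can be computed with the standard library alone. This is a minimal sketch of a generic textbook calculation, not a reproduction of the CDF analysis. The helper name `chi2_sf_even_dof` is hypothetical. It uses the closed form of the χ² survival function, which is valid only for an even number of degrees of freedom (both 84 and 20 are even). It also prints the expectation ν ± √(2ν) of each χ² variable.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(chi2 >= x) for an even number of degrees of freedom.

    For dof = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive, even dof")
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 0.0
    for k in range(dof // 2):
        if k > 0:
            term *= half / k   # (x/2)**k / k! obtained from the previous term
        total += term
    return math.exp(-half) * total


# Values reported in the CDF paper: (observed chi2, degrees of freedom).
reported = {"full spectrum": (77.1, 84), "120-160 GeV": (26.1, 20)}

tail_prob = {}
for region, (chi2, dof) in reported.items():
    expected, spread = dof, math.sqrt(2 * dof)   # mean and standard deviation of chi2
    tail_prob[region] = chi2_sf_even_dof(chi2, dof)
    print(f"{region}: chi2 = {chi2} vs expected {expected} +- {spread:.0f}, "
          f"P(chi2 >= observed) = {tail_prob[region]:.2f}")
```

Both tail probabilities come out far from any conventional threshold, roughly 0.7 and 0.16. In this practice, that is what "nothing to be surprised about" means.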

I know by experience that, when a test does not say what practitioners would like, other tests are tried, like when one goes around looking for someone who finally says one is right^6. Indeed, in statistical practice there is much freedom and arbitrariness about which test to use and how to use it. This is because hypothesis tests of so-called classical statistics do not follow strictly from probability theory, but are just a collection of ad hoc prescriptions. For this reason I do not want to go into what CDF finally quotes as p-value (my only comment is that it does not even seem to be a usual p-value). Let us then just stick to the paper, reporting here the claim, followed by a reminder of what a statistician would understand by that name:

• "we obtain a p-value of 7.6 × 10^-4, corresponding to a significance of 3.2 standard deviations" [9];

• "the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true." [11] [The null hypothesis (H_0) is in this case "only standard physics, without contributions from new phenomena".]
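The conversion between a p-value and a number of 'standard deviations' in the first bullet is the conventional one-sided Gaussian tail. This is a minimal sketch using Python's `statistics.NormalDist`, with hypothetical helper names. It recovers the 3.2σ quoted by CDF from 7.6 × 10^-4. It also shows that the "99.7 percent" of the Discovery News quote below is just one minus the two-sided 3σ tail. Every number here is computed assuming H_0 is true.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1


def sigma_from_p(p_value):
    """One-sided significance z such that P(X >= z | H0) = p_value, X ~ N(0, 1)."""
    return std_normal.inv_cdf(1.0 - p_value)


def p_from_sigma(z, two_sided=False):
    """P(X >= z), or P(|X| >= z) if two_sided, for X ~ N(0, 1)."""
    tail = 1.0 - std_normal.cdf(z)
    return 2.0 * tail if two_sided else tail


z_cdf = sigma_from_p(7.6e-4)
print(f"p = 7.6e-4 -> {z_cdf:.2f} standard deviations (one-sided)")
print(f"two-sided tail beyond 3 sigma:   {p_from_sigma(3.0, two_sided=True):.4f}")
print(f"two-sided tail beyond 2.9 sigma: {p_from_sigma(2.9, two_sided=True):.4f}")
```

The last line gives about 3.7 × 10^-3. This may be one way the '4 per mille' for 2.9σ in the email quoted in footnote 11 was obtained. That is an interpretation, not something the email states.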

2.3 How the claim was explained to the general public (and perhaps even 
what some particle physicist thought) 

But now we come to the crucial point of this paper, since the CDF report [9] is by itself not so 'dangerous'. People do not know about p-values. As a matter of fact, even those who calculate them for scientific purposes seem to be highly confused about their meaning [11], as we shall see later. What ordinary people understand is only the chance that a team has made a discovery rather than having just observed a statistical fluctuation; or, at least, how much experts believe that the bump is a hint of new physics rather than a fluke.

Let us go straight to read how the thing was reported in some online resources (boldface 
is mine). 

^5 Let us remind the reader that if a variable is described by a χ² distribution with ν degrees of freedom, our (probabilistic) expectation ('expected value') is ν, with expectation uncertainty ('standard deviation') √(2ν). Hence if χ²_1 and χ²_2 are variables of that kind, with ν_1 = 84 and ν_2 = 20, our expectations will be "84 ± 13" and "20 ± 6", respectively. (As a side remark, we notice that adding a Gaussian component to explain the 'excess' increases the difference between the expected and observed values of the test statistic, since the χ² goes to 56.7 for the entire region and 10.9 for the 'peak region'.)

^6 After years of practice in particle physics and related subjects, I have developed my own rule of thumb, which until now has never failed: "the funnier the name of the test used to show that there is a disagreement with the 'Standard Model' (or whatever is considered firmly established), the less I believe that this is the case" (with the corollary that "in the future I will tend to mistrust those people").
• The New York Times, April 5 [2]:

"Physicists at the Fermi National Accelerator Laboratory are planning to 
announce Wednesday that they have found a suspicious bump in their data 
that could be evidence of a new elementary particle or even, some say, a 
new force of nature. 

The experimenters estimate that there is a less than a quarter of 1 
percent chance their bump is a statistical fluctuation" 

• Fermilab Today, April 7 [12]:

"Wednesday afternoon, the CDF collaboration announced that it has evidence
of a peak in a specific sample of its data. The peak is an excess
of particle collision events that produce a W boson accompanied by two
hadronic jets. This peak showed up in a mass region where we did not
expect one.

The significance of this excess was determined to be 3.2 sigma, after
accounting for the effect of systematic uncertainties. This means that there
is less than a 1 in 1375 chance that the effect is mimicked by a
statistical fluctuation."

• Discovery News, April 7 [1]:

"If you're a little hazy about the details of Wednesday's buzz surrounding 
the potential discovery of "new physics" in Fermilab's Tevatron particle 
accelerator, don't worry, you're not alone. This is a big week for particle 
physicists, and even they will be having many sleepless nights over the 
coming months trying to grasp what it all means. 

That's what happens when physicists come forward, with observational 
evidence, of what they believe represents something we've never seen before. 
Even bigger than that: something we never even expected to see. 

It is what is known as a "three-sigma event," and this refers to the statistical
certainty of a given result. In this case, this result has a 99.7
percent chance of being correct (and a 0.3 percent chance of being 
wrong)." 

• Jon Butterworth's blog in the Guardian [13]:

"The last and greatest breakthrough from a fantastic machine, or a false 
alarm on the frontiers of physics? 
If the histograms and data are exactly right, the paper quotes a one-in-ten-thousand (0.0001) chance that this bump is a fluke."

Let us take the logical complements of the highlighted statements (except for the Discovery News one, which already provided the complementary propositions):

• "there is more than 99.75 percent chance their bump is not a statistical fluctuation";

• "there is more than 99.93% chance that the effect is not mimicked by a statistical fluctuation";

• "the paper quotes a 99.99% chance that this bump is not a fluke".

These can be summarized by saying that 'we' should be highly confident this is a genuine discovery.
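The complements listed above are obtained by the one-line operation 1 − p. The sketch below spells it out, using the numbers quoted in the articles (the dictionary keys are just labels). It makes clear that nothing but this subtraction stands between the quoted p-value-like numbers and the impression of a near-certain discovery. The next section explains why the subtraction does not yield a probability of the hypothesis.

```python
# Fluke probabilities as quoted by the media (upper bounds where the text says "less than").
quoted_fluke_probability = {
    "The New York Times": 0.0025,   # "less than a quarter of 1 percent"
    "Fermilab Today": 1 / 1375,     # "less than a 1 in 1375 chance"
    "Guardian blog": 1e-4,          # "one-in-ten-thousand (0.0001)"
}

# The logical complement, i.e. what a reader naturally infers about a genuine discovery.
implied_discovery_probability = {
    source: 1.0 - p for source, p in quoted_fluke_probability.items()
}

for source, prob in implied_discovery_probability.items():
    print(f"{source}: {100 * prob:.2f}%")
```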

3 Where is the problem? 

The point is very simple. No matter which test statistic has been used, there is no simple logical relation between a p-value and the probability of the hypothesis under test (H_0, in this case "H_0 = No New Physics").

Indeed, p-values are notoriously misunderstood. This is well explained in a section of Wikipedia, which I report here verbatim for the convenience of the reader [11], highlighting the sentences most relevant to our discussion.

1. "The p- value is not the probability that the null hypothesis is 
true. In fact, frequentist statistics does not, and cannot, attach probabili- 
ties to hypotheses. Comparison of Bayesian and classical approaches shows 
that a p-value can be very close to zero while the posterior probability of 
the null is very close to unity (if there is no alternative hypothesis with a 
large enough a priori probability and which would explain the results more 
easily). This is the Jeffreys-Lindley paradox. 

2. The p-value is not the probability that a finding is "merely a
fluke." As the calculation of a p-value is based on the assumption that a
finding is the product of chance alone, it patently cannot also be used to
gauge the probability of that assumption being true. This is different from 
the real meaning which is that the p-value is the chance of obtaining such 
results if the null hypothesis is true. 

3. The p-value is not the probability of falsely rejecting the null hypothesis. 
This error is a version of the so-called prosecutor's fallacy. 

4. The p-value is not the probability that a replicating experiment would not 
yield the same conclusion. 



5. (1 − p-value) is not the probability of the alternative hypothesis being true.

6. The significance level of the test is not determined by the p-value. The
significance level of a test is a value that should be decided upon by the
agent interpreting the data before the data are viewed, and is compared
against the p-value or any other statistic calculated after the test has been
performed. (However, reporting a p-value is more useful than simply saying
that the results were or were not significant at a given level, and allows the
reader to decide for himself whether to consider the results significant.)

7. The p-value does not indicate the size or importance of the observed effect
(compare with effect size). The two do vary together however: the larger
the effect, the smaller sample size will be required to get a significant p-value."
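Points 1, 2 and 5 of the list can be made concrete with the simplest possible toy model. There is a single Gaussian measurement and two simple hypotheses: 'only standard physics' (mean 0) and a hypothetical signal (mean 3.2, in units of σ). This is a minimal sketch under those assumptions. It is not the Jeffreys-Lindley setup itself, which involves a diffuse alternative, and it is not an analysis of the CDF data. The prior probabilities are purely illustrative, and `posterior_null` is a hypothetical helper. The p-value is the same in every case (about 7 × 10^-4). The probability of the null hypothesis, however, depends on the prior. It is about 0.006 if the signal was considered as likely as not beforehand, but about 0.86 if it was considered a 1-in-1000 possibility.

```python
from statistics import NormalDist


def posterior_null(x, mu_alt, prior_alt, sigma=1.0):
    """P(H0 | x) for two simple hypotheses about a Gaussian measurement x.

    H0: x ~ N(0, sigma)        ("only standard physics")
    H1: x ~ N(mu_alt, sigma)   (a hypothetical signal of size mu_alt)
    prior_alt is the prior probability assigned to H1.
    """
    like_null = NormalDist(0.0, sigma).pdf(x)
    like_alt = NormalDist(mu_alt, sigma).pdf(x)
    prior_null = 1.0 - prior_alt
    evidence = like_null * prior_null + like_alt * prior_alt   # P(x), total probability
    return like_null * prior_null / evidence                   # Bayes' theorem


x_obs = 3.2                                   # observed value, in units of sigma
p_value = 1.0 - NormalDist().cdf(x_obs)       # P(X >= x_obs | H0)

for prior_alt in (0.5, 0.001):                # two hypothetical prior beliefs in H1
    post = posterior_null(x_obs, mu_alt=3.2, prior_alt=prior_alt)
    print(f"p-value = {p_value:.1e}, P(H1) = {prior_alt}: P(H0 | x) = {post:.3f}")
```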

Are you still sure you have really understood what p-values mean?

4 Why is there such a problem?

In short, the reason for the confusion is a mismatch between the natural way of thinking and what we have learned in statistics courses^7.

4.1 "The essential problem of the experimental method" 

Human minds reason very naturally in terms of how believable (or 'likely', or 'probable') different hypotheses are in the light of everything we know about them (see e.g. [15]). The mathematical theory of how beliefs are updated by new pieces of information was basically developed in a monumental work by Laplace exactly two hundred years ago [16, 17], although nowadays this way of reasoning goes under the name of Bayesian. This approach considers as valid sentences such as "probability that the CDF bump is a fluke", "probability that the Higgs boson mass is below 130 GeV"^8 and similar. All these expressions refer to "a problem in the probability of causes, [...] the essential problem of the experimental method" [19]: from the observed effects we try to rank in probability the alternative causes that might have produced them.

4.2 A curious ideology 

Now the problem arises because of a curious ideology of statistical thinking ('frequentism') that forbids speaking of the probability of causes. It is not a matter of a different way of making the calculations, but an ideological refusal to calculate them! Nevertheless, and this is the worst part, most people do not even realize this. Even if they think they adhere to the frequentistic approach ("probability is the long run limit of relative frequency", and so on), they are not aware that, according to this unfortunately still dominant school, they should not be allowed to speak about probability of true values, probability of causes, and so on. As a matter of fact, when I try to say this in seminars, people usually stare at me as if I had just landed from a distant planet.

^7 Having written quite a lot on the subject, I don't want to go through yet another introduction to it. I refer instead to the 'Columbo paper' [5] (some might also find [14] useful), only recalling here some of the basic ideas.

^8 By the way, it was about 36% to the best of our knowledge at the beginning of 1999 [18], and it has changed with time, especially during 2011!

4.3 The mismatch 

As a consequence, the results of frequentistic methods are usually interpreted as if they were probabilities of hypotheses. This is also because the names attached to them suggest so, even though those names do not correspond to what the results really are. It is more or less like the misuse of names, adjectives and expressions common in advertisements. It follows that some results of frequentistic prescriptions are called confidence interval, confidence level or 95% upper/lower C.L., although they are definitely not intended to mean how confident we should be in something^9. If you consider yourself a frequentist, but you find what you are reading here strange, trust at least Neyman's recommendations:

"Carry out your experiment, calculate the confidence interval, and state that c 
belong to this interval. If you are asked whether you 'believe' that c belongs to 
the confidence interval you must refuse to answer. In the long run your asser- 
tions, if independent of each other, will be right in approximately a proportion 
α of cases." (J. Neyman, 1941, cited in Ref. [22])

Clearly, this is not what a scientist (or anybody else) wants. Otherwise, if one is just happy to make statements that are, e.g., correct 95% of the time, there is no need to waste time and money making experiments. Just state, 95% of the time, something that is practically certainly true, and, the remaining 5% of the time, something that is practically certainly false^10.
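The recipe in the previous paragraph can literally be coded. The sketch below produces 'intervals' that contain any true value in exactly the chosen proportion of cases in the long run. That is all that Neyman's prescription asks for, yet the intervals say nothing at all about the quantity of interest. The function names are hypothetical, and this is not the calculator mentioned in footnote 10. The true value of 3.0 is arbitrary.

```python
import random


def ultimate_interval(cl, rng):
    """A 'confidence interval' with coverage cl that ignores the data entirely.

    With probability cl it returns the whole real line (certainly containing the
    true value); otherwise it returns an empty interval (certainly not containing it).
    """
    if rng.random() < cl:
        return (float("-inf"), float("inf"))
    return (1.0, 0.0)   # lower bound above upper bound: an empty interval


def empirical_coverage(cl, true_value, n_trials, seed=2011):
    """Fraction of trials in which the interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        lower, upper = ultimate_interval(cl, rng)
        if lower <= true_value <= upper:
            hits += 1
    return hits / n_trials


coverage = empirical_coverage(cl=0.95, true_value=3.0, n_trials=100_000)
print(f"empirical coverage: {coverage:.3f}")   # close to 0.95, by construction
```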

Put in other terms: if what you want is a quantitative assessment of how confident you should be in something, on the basis of the information available to you, then use a framework of reasoning that deals with probabilities. Probabilities might be difficult to assess precisely in quantitative terms. That does not justify calculating something else and then using it as if it were a probability. For example, on the basis of the evaluated probability you might want to make decisions, which essentially means making bets of several kinds. Sticking to particle physics activity, these decisions might concern the following:

- how much emphasis you want to give to a 'bump' (just send a student to show it at a conference, publish a paper, or even make press releases and organize a 'ceremonious' seminar with prominent people sitting in the first rows);
- whether it is worth continuing an experiment;
- whether it is better to build another one;
- whether to invest in new technologies;
- whether to plan a future accelerator;

and so on. In all cases, rational decisions require balancing the utilities resulting from different scenarios, weighted by how probable you consider them. Using p-values, or something similar, as if they were probabilities can lead to very bad mistakes^11.

^9 For example, when I ask about the meaning of the 95% CL lower bound on the Higgs mass from the LEP direct search, practically everybody (and I speak of particle physicists!) 'explains' the result in probabilistic terms [20]. Yet it is well known to frequentistic experts that "The lower bounds on the Higgs mass that are quoted for the direct Higgs searches at LEP say absolutely nothing about the probability of the Higgs mass being higher or lower than some value." [21] (By the way, it seems that the method described in [21] is essentially the one on which the LHC collaborations have agreed to report search limits: at least you now know what these results (do not) mean!)

^10 If you want to try, you can play with The ultimate confidence intervals calculators. No strict follower of Neyman's teaching can blame you for the results, which asymptotically will 'cover' the true value of whatever quantity you have in mind in exactly the proportion of times you pre-define.
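The point about rational decisions, i.e. utilities weighted by probabilities, can be illustrated with a minimal expected-utility calculation. The utilities below are invented for illustration and are not taken from any real decision. The two probabilities echo the case discussed in footnote 11. The first is 99.6%, obtained by misreading a p-value. The second is 'a few percent', here an assumed 3%. With these made-up numbers the break-even probability is 1/6, so the two readings lead to opposite decisions.

```python
def expected_utility(p_signal, utility_if_signal, utility_if_no_signal):
    """Utility of an action averaged over the two scenarios, weighted by their probabilities."""
    return p_signal * utility_if_signal + (1.0 - p_signal) * utility_if_no_signal


# Hypothetical utilities, in arbitrary units, for two possible decisions.
UTILITIES = {
    "continue the run": {"signal": 10.0, "no signal": -3.0},
    "stop the run": {"signal": -10.0, "no signal": 1.0},
}


def best_action(p_signal):
    """Return the action with the highest expected utility, together with all the scores."""
    scores = {
        action: expected_utility(p_signal, u["signal"], u["no signal"])
        for action, u in UTILITIES.items()
    }
    return max(scores, key=scores.get), scores


for p in (0.996, 0.03):   # a p-value misread as a probability vs. a few percent
    action, scores = best_action(p)
    print(f"P(signal) = {p}: best action is '{action}' {scores}")
```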

^11 For example, in 2000 there was some excitement at CERN because some LEP experiments were observing some events above the expectation. There was a big campaign against the CERN directorate, which had decided to stop LEP in order to use structures and human/financial resources for LHC. This is an email I received on the 10th of November 2000, addressed to a short list of physicists:

Subject: Do you want the Higgs found next year? 

As you may know CERN DG, L.Maiani, has decided to shut off LEP. The decision is to be 
confirmed at a CERN Committee of Council meeting on Friday 17th. 

As you probably know there is evidence for a Standard Model Higgs boson seen in the data 
in the last few months, with a probability as a background fluctuation of 4 per mille, or 2.9 
sigma. 

In other words we are seeing exactly what we should expect if Mh—115. 
[If you wonder why 2.9σ's is 4 per mille, instead of 2, don't ask me.]

The message ended with a request to write to Maiani in support of extending the LEP run. Here follows my instant reply:

Let me understand: 

do you REALLY feel 99.6% sure that the Higgs is around 115GeV (let's say below the effective 
kinematical threshold at the present LEP energy)? If not, how much are you confident? 

Running or not running is a delicate decision problem which involves beliefs and risks (both 
financial and sociological). Therefore, I cannot disagree much with Maiani, being in his posi- 
tion. 

On the other hand, in the position of any LEP collaborator I would push to run, certainly! 
(Given the same beliefs, the risk analysis is completely different). 

Being myself neither the CERN Director-General, or a LEP physicist, but, with this respect, 
just a physics educated tax payer, I find myself more on the side of Maiani than on that of
our LEP colleagues. 

To make it clear, the "99.6%" could not be how much we had to rationally believe that the Higgs was at 115 GeV, because it was a 0.004 p-value incorrectly turned into a probability. Estimating the probability correctly, one would have got a few percent (see e.g. [15] for the method, although the numbers had changed in the meantime). And with a few percent, it would have been crazy to continue the LEP run, delay LHC and so on. On the other hand, if there had really been a 99.6% probability, then LEP had to go on. (As often happens with misinterpreted frequentistic methods, the errors are not small. It is not like getting 99.6 for what should have been 99.1, 98.5, or even perhaps 97%! See chapter 1 of for other examples. Here one considered practically certain something that was instead almost impossible.)
5 The mathematics of beliefs



Among the web resources mentioned above, I find Jon Butterworth's blog in the Guardian [13] particularly enlightening. Let us go back to the expression he used to explain the statistical meaning of the result. Let us compare it with the last paragraph of the article, split here into three pieces (1-3):

(0) "the paper quotes a one-in-ten-thousand (0.0001) chance that this bump is a fluke." 

(1) "My money is on the false alarm at the moment,. . ." 

(2) ". . . but I would be very happy to lose it." 

(3) "And I reserve the right to change my mind rapidly as more data come in!" 

We have already seen that proposition (0) is just a misleading misinterpretation of p-values, about which there is little to discuss. Instead, the last paragraph is a masterpiece of correct, good reasoning (I would almost say Good's reasoning [2^), which deserves some comments.

5.1 Stating the strength of "pragmatic beliefs" by odds

From proposition (1) we finally understand Butterworth's beliefs very well, in spite of the contradiction with (0). In fact, since ancient times betting has been recognized as the best way to check how much one really believes something. Kant stated this well when he discussed pragmatic beliefs [23]:

"The usual touchstone, whether that which someone asserts is merely his per- 
suasion - or at least his subjective conviction, that is, his firm belief - is betting. 
It often happens that someone propounds his views with such positive and un- 
compromising assurance that he seems to have entirely set aside all thought 
of possible error. A bet disconcerts him. Sometimes it turns out that he has 
a conviction which can be estimated at a value of one ducat, but not of ten. 
For he is very willing to venture one ducat, but when it is a question of ten he 
becomes aware, as he had not previously been, that it may very well be that he 
is in error." 

And, in fact, in Laplace's mathematical theory of probability all probabilistic statements can be mapped into betting statements. A famous example is his evaluation of the uncertainty on the value of the mass of Saturn [17]:

"To give some applications of this method I have just availed myself of the 
opus magnus that Mr. Bouvard has just finished on the motions of Jupiter and 
Saturn, of which he has given very precise tables. . . . His calculations give him 
the mass of Saturn as 3,512th part of that of the sun. Applying my probabilistic 
formulae to these observations, I find that the odds are 11,000 to 1 that the error
in this result is not a hundredth of its value."
That is

$P(3477 < M_{\mathrm{Sun}}/M_{\mathrm{Sat}} < 3547 \mid I(\mathrm{Laplace})) = 99.99\%$,

where I(Laplace) stands for all information available to Laplace (probabilistic statements are always conditioned by a state of information). Laplace's result is a very clear statement, and there is a perfect match between beliefs, odds and probabilistic statement. Instead, I assure you, a "95% C.L. lower limit" result cannot be turned into a 19:1 bet that the quantity in question is above that limit (see footnote 9). Nor can a p-value of, e.g., 10^-4 be turned into a 10000:1 bet in favor of a discovery (see also the last minute reference [30]).
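The correspondence between odds and probabilities used by Laplace is simple arithmetic: odds of a to b correspond to a probability a/(a + b). This is a minimal sketch with exact fractions and hypothetical helper names. It shows that 11,000 to 1 is 99.99%. It also shows that the interval 3477 to 3547 is what one gets by reading 'not a hundredth of its value' as ±1% around Bouvard's 3512 (our reading). Finally, it gives the bets that would be implied if a 95% C.L. or 1 − p were genuine probabilities, which is precisely the step the text warns against.

```python
from fractions import Fraction


def odds_to_probability(in_favor, against):
    """Probability corresponding to betting odds of in_favor to against."""
    return Fraction(in_favor, in_favor + against)


def probability_to_odds(p):
    """Odds in favor (in_favor / against) corresponding to probability p."""
    return p / (1 - p)


p_laplace = odds_to_probability(11_000, 1)
print(f"odds 11,000 to 1 -> probability {float(p_laplace):.5f}")

# Reading "the error is not a hundredth of its value" as +-1% around Bouvard's 3512.
ratio = 3512
print(f"interval: {ratio * 0.99:.0f} to {ratio * 1.01:.0f}")

# Bets that would follow if these numbers were probabilities (they are not).
print(probability_to_odds(Fraction(95, 100)))        # 95% C.L. read as probability -> 19
print(probability_to_odds(1 - Fraction(1, 10_000)))  # 1 - 1e-4 -> 9999, about 10000:1
```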



5.2 Coherent virtual bets 

A few comments on the way Laplace reported his result in terms of betting odds are in 
order. 

• First, he does not say that he would be ready to make an 11,000 to 1 bet in favor of the result, but rather that "the odds are 11,000 to 1". This implies that "11,000 to 1 in favor" and "1 to 11,000 against" are both fair bets. This is essentially the idea behind the so-called de Finetti's coherent bet [25]. In order to express your degree of belief in favor of something, you fix the odds and let somebody else choose in which direction to bet. This is the best way to force people to assess what they really believe, no matter what the event is and how the probability has been evaluated. In the limit, it may be evaluated just by intuitive reasoning, if no other means are available; why not? What is important is that, once you fix the odds, you have no noticeable preference towards either direction.

• Second, what is the sense of a bet whose outcome would probably not have been settled in Laplace's lifetime? This is another important ingredient: bets have to be considered hypothetical ('virtual'). It is just a way to assess probability!^12
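The coherence requirement in the first bullet can be checked with exact arithmetic. At the odds implied by a probability, both directions of the bet have zero expected gain, so the person who fixed the odds has no reason to prefer either side. This is a minimal sketch using Laplace's 11,000 to 1. The helper name `expected_gain` is hypothetical.

```python
from fractions import Fraction


def expected_gain(p_event, own_stake, opponent_stake):
    """Expected gain of whoever bets ON an event of probability p_event.

    They lose own_stake if the event does not happen and win opponent_stake if it does.
    """
    return p_event * opponent_stake - (1 - p_event) * own_stake


p = Fraction(11_000, 11_001)   # probability corresponding to odds of 11,000 to 1

# Fair odds: 11,000 staked on the result against 1 staked against it.
gain_in_favor = expected_gain(p, own_stake=11_000, opponent_stake=1)
gain_against = expected_gain(1 - p, own_stake=1, opponent_stake=11_000)
print(gain_in_favor, gain_against)    # both exactly 0: no preferred direction

# Any other odds favor one side, so whoever fixed them would care about the direction.
print(expected_gain(p, own_stake=11_000, opponent_stake=2) > 0)
```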



5.3 Belief vs imagination, belief vs wish, subjective vs arbitrary: the role of the coherent (virtual) bet

The role of the bet, although virtual, in the sense of 'as I would be called to bet', is crucial 
to make clear distinctions between different concepts that could otherwise be confused. 

• We can imagine something just by combining ideas (even "the New Jerusalem, whose pavement is gold and walls are rubies"; on this issue a reference to Hume is a must [15], T 1.1.1.4), and yet not believe it.

^12 But if you really have the chance of making real bets, don't use the fair odds: you want to maximize the expected gain! This is what insurance companies and professional bookmakers do. They evaluate the fair odds and then propose the most unfair ones in a given direction, as unbalanced as someone can still accept.
• We should also be careful not to confuse what we wish with what we do believe. I would like to win the highest prize playing the lottery, but I don't believe I will.
Similarly - and this is well stated in proposition (2) - I think everyone working in 
frontier science would be very happy if something really new 'appears', such that it 
forces us to change our vision of the world. But before we can accept something 
like that we really need much experimental evidence, obtained in different ways with 
different techniques. 

• Finally, it is a matter of fact that

"Since the knowledge may be different with different persons or with the
same person at different times, they may anticipate the same event with
more or less confidence, and thus different numerical probabilities may be
attached to the same event." [26]

It follows that probability is always conditional probability, as Schrödinger [26] again states well:

"Thus whenever we speak loosely of 'the probability of an event,' it is
always to be understood: probability with regard to a certain given state
of knowledge,"

i.e. P(E) always has to be understood as P(E | I), where I stands for a given state of information, which changes with persons (subjects) and with time. Hence a probability assessment always has to be meant as

$P(E \mid I_s(t))$.

This is the meaning of the adjective subjective attached to probability, which has nothing to do with arbitrary. Once again, thinking in terms of bets helps to distinguish what is really arbitrary from sound rational beliefs. Noble but empty ideals of 'objectivity' can easily drift into 'metaphysics' and do not help.

To conclude this subsection: when somebody claims something on the basis of arguments that you do not clearly understand, follow Kant's suggestion and ask him/her to bet money. And, if it is a claim in favor of new/extraordinary physics based only on a p-value, don't hesitate to cash in. This is nicely shown in the comic of Figure 2 [27], which appeared immediately after the recent (in?)famous result on superluminal neutrinos [28]. (But, besides the humorous side, I invite my colleagues to reflect on the fact that the general public is not by definition stupid. There is an increasing number of well educated taxpayers who are starting to get tired of fake claims.)

5.4 Updating beliefs 

Let us finally come to proposition (3): rational people are ready to change their opinion
in the face of 'enough' experimental evidence. What is enough? It is quite well understood
that it all depends on

• how much the new thing differs from our initial beliefs;

• how strong our initial beliefs are.

Figure 2: A comic from xkcd [27] on superluminal neutrinos, valid for any fancy claim.

This is the reason why practically nobody took the CDF claim very seriously (not even most
members of the collaboration, and I know several of them), while practically everybody is
now convinced that the Higgs boson has finally been caught at CERN [31], no matter that
the so-called 'statistical significance' is more or less the same in both cases (which was,
by the way, also more or less the same for the excitement at CERN described in an earlier
footnote). Nevertheless, the degree of belief that a Higgs boson has been found at CERN is
substantially different!

Probability theory teaches us how to update the degrees of belief on the different causes
that might be responsible for an 'event' (read 'experimental data'), as simply explained by
Laplace in his Philosophical Essay [17] ('VI principle' at page 17 of the original book,
available at books.google.com; boldface is mine). Note that in the Essai 'principles' do not
stand for what we now mean by 'first principles' or 'axioms', but are rather the fundamental
rules of probability that Laplace had derived elsewhere. The principle reads:

"The greater the probability of an observed event given any one of a number
of causes to which that event may be attributed, the greater the likelihood
of that cause {given that event}. The probability of the existence of any one
of these causes {given the event} is thus a fraction whose numerator is the
probability of the event given the cause, and whose denominator is the sum
of similar probabilities, summed over all causes. If the various causes are not
equally probable a priori, it is necessary, instead of the probability of the event
given each cause, to use the product of this probability and the possibility of
the cause itself. This is the fundamental principle of that branch of the
analysis of chance that consists of reasoning a posteriori from events
to causes."

This is the famous Bayes' theorem (although Bayes did not really derive this formula, but
only developed a similar inferential reasoning for the parameter of Bernoulli trials), which
we rewrite in mathematical terms [omitting the subjective 'background condition' I_s(t) that
should appear - and be the same! - in all probabilities of the same equation] as

$$P(C_i \mid E) = \frac{P(E \mid C_i)\, P(C_i)}{\sum_j P(E \mid C_j)\, P(C_j)}\,.$$

This formula teaches us that what matters is not (only) how probable E is in the
light of C_i (unless it is impossible, in which case C_i is ruled out - it is falsified, to
use a Popperian expression), but rather

• how P(E | C_i) compares with P(E | C_j), where C_i and C_j are two distinct
causes that could be responsible for the same effect;

• how P(C_i) compares with P(C_j).
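The rule above fits in a few lines of code. The following Python sketch works over a finite set of mutually exclusive causes. It computes the products P(E | C_i) P(C_i) and normalizes them by their sum. The function name and all the numbers are hypothetical, chosen only for illustration. The example shows how a large ratio of P(E | C_i) values can still leave a cause improbable when its prior is small, and how a cause under which E is impossible drops out.

```python
def posterior_of_causes(p_e_given_c, prior_c):
    """Bayes' theorem over a finite set of mutually exclusive causes C_i:
    P(C_i | E) = P(E | C_i) P(C_i) / sum_j P(E | C_j) P(C_j)."""
    if len(p_e_given_c) != len(prior_c):
        raise ValueError("need one P(E | C_i) and one P(C_i) per cause")
    joint = [pe * pc for pe, pc in zip(p_e_given_c, prior_c)]  # numerators
    norm = sum(joint)  # denominator: sum over all causes
    if norm == 0:
        raise ValueError("E is impossible under every cause considered")
    return [j / norm for j in joint]


# Hypothetical numbers: E is 100 times more probable under C_2 than under C_1,
# but C_2 is initially judged 999 times less probable than C_1.
post = posterior_of_causes([0.001, 0.1], [0.999, 0.001])
print([round(p, 3) for p in post])  # [0.909, 0.091]

# A cause under which E is impossible is ruled out ("falsified").
print(posterior_of_causes([0.0, 0.5], [0.5, 0.5]))  # [0.0, 1.0]
```

Only P(E | C_i) for the observed E enters the calculation. No probability of unobserved, more extreme outcomes is needed.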

Note that here 'likelihood' is the same as probability, and has nothing to do with what
statisticians call 'likelihood'; reading the original French version directly might help, also
taking into account that two hundred years ago these nouns were not as specialized as they are now.

In modern terms, the problem solved by Bayes in a quite convoluted notation [29] was the inference of
the binomial parameter p, conditioned on x successes in n trials, under the assumption that all values of p
were a priori equally likely:

$$f(p \mid n, x) = \frac{f(x \mid n, p)}{\int_0^1 f(x \mid n, p)\, dp}\,.$$

Laplace solved this problem independently and, indeed, the formula that gives the expected value of p, i.e.

$$\mathrm{E}(p) = \frac{x+1}{n+2}\,,$$

is known as Laplace's rule of succession.
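The following Python sketch checks the footnote numerically. It integrates the binomial f(x | n, p) over a uniform prior with a simple midpoint rule and compares the resulting expected value of p with (x + 1)/(n + 2). The function names are hypothetical. It is a brute-force illustration under these assumptions, not how Bayes or Laplace proceeded.

```python
from math import comb


def binomial_pmf(x, n, p):
    """f(x | n, p): probability of x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def posterior_mean_uniform_prior(x, n, steps=20_000):
    """E(p) when f(p | n, x) is proportional to f(x | n, p) (uniform prior on [0, 1]).

    Both integrals use the midpoint rule; the common step width cancels
    in the ratio, so it is omitted.
    """
    num = den = 0.0
    for k in range(steps):
        p = (k + 0.5) / steps
        w = binomial_pmf(x, n, p)
        num += p * w
        den += w
    return num / den


# Numerical posterior mean vs. Laplace's rule of succession.
for x, n in [(0, 0), (3, 10), (9, 10)]:
    print(x, n, round(posterior_mean_uniform_prior(x, n), 6), round((x + 1) / (n + 2), 6))
```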
The essence of the Laplace(-Bayes) rule can be emphasized by writing the above formula for
any couple of causes C_i and C_j as

$$\frac{P(C_i \mid E)}{P(C_j \mid E)} = \frac{P(E \mid C_i)}{P(E \mid C_j)} \times \frac{P(C_i)}{P(C_j)}\,,$$

i.e. the odds are updated by the observed effect by a factor (Bayes factor) given by the
ratio of the probabilities of the two causes to produce that effect.
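The odds form can be sketched directly in Python. The function names and the numbers are illustrative assumptions only. In the example, the data are 50 times more probable under a 'new effect' C_i than under the 'standard' C_j, and the prior odds are 1:1000 against C_i.

```python
def prob_to_odds(p):
    """Probability p -> odds p : (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(o):
    """Odds o -> probability o / (1 + o)."""
    return o / (1.0 + o)


def update_odds(prior_odds, p_e_given_ci, p_e_given_cj):
    """Posterior odds of C_i vs C_j = Bayes factor x prior odds."""
    bayes_factor = p_e_given_ci / p_e_given_cj
    return bayes_factor * prior_odds


# Hypothetical: Bayes factor 50 in favour of C_i, prior odds 1 : 1000 against it.
post_odds = update_odds(1 / 1000, 0.05, 0.001)
print(round(post_odds, 4), round(odds_to_prob(post_odds), 4))  # 0.05 0.0476
```

Even a sizeable Bayes factor leaves C_i improbable if it was very improbable to start with.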
In particular, we learn that:

• It makes no sense to speak about how the probability of C_i changes if:

— there is no alternative cause C_j;

— the way in which C_j might produce E has not been modelled, i.e. if P(E | C_j) has
not been somehow assessed.

• The updating depends only on the Bayes factor, a function of the probability of E
given either hypothesis, and not on the probability of other events that have not been
observed and that are even less probable than E (upon which p-values are instead
calculated).

• One should be careful not to confuse P(C_i | E) with P(E | C_i) and, in general, P(A | B)
with P(B | A). Or, moving to continuous variables, f(μ | x) with f(x | μ), where 'f()'
stands, depending on the context, for a probability function or for a probability density
function, while x and μ stand for an observed quantity and a true value, respectively.

In particular, the last point looks rather trivial, as can be seen from the 'senator Vs
woman' example of the abstract. But already the Gaussian generator example there might
confuse somebody, while the 'μ Vs x' example is a typical source of misunderstandings,
also because in the statistical jargon f(x | μ) is called the 'likelihood' function of μ, and many
practitioners think it describes the probabilistic assessment concerning the possible values
of μ (again a misuse of words! For further comments see Appendix H of [5]).
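The following Python sketch makes the Gaussian generator example concrete. Take a generator A with μ = 0 and σ = 1. The sketch computes two quantities: the probability that A produces a number at least as large as 3 (the tail probability on which a p-value is based), and the probability that A produces exactly 3.000 after rounding to three decimals. Both are probabilities of the observation given A, not of A given the observation. To say anything about A, an alternative is needed. Here we assume, purely for illustration, a hypothetical generator B with μ = 3 and σ = 1, and compute the Bayes factor. The function names are also hypothetical.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(X >= z) for a standard Gaussian X."""
    return 0.5 * erfc(z / sqrt(2.0))


def prob_rounded_value(value, mu, sigma, half_width=0.0005):
    """P(a Gaussian(mu, sigma) output rounds to `value` at three decimals)."""
    lo = (value - half_width - mu) / sigma
    hi = (value + half_width - mu) / sigma
    return upper_tail(lo) - upper_tail(hi)


observed = 3.000
p_tail = upper_tail(3.0)                          # P(X >= 3 | A): basis of the 'p-value'
p_obs_a = prob_rounded_value(observed, 0.0, 1.0)  # P(3.000 | A), A: mu = 0, sigma = 1
p_obs_b = prob_rounded_value(observed, 3.0, 1.0)  # P(3.000 | B), hypothetical B: mu = 3

print(f"P(X >= 3 | A)    = {p_tail:.2e}")
print(f"P(3.000 | A)     = {p_obs_a:.2e}")
print(f"Bayes factor B/A = {p_obs_b / p_obs_a:.1f}")
```

Whether 3.000 then points to B depends on that Bayes factor and on how believable B was in the first place. The tiny numbers computed for A alone do not answer that question.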

6 Conclusions 

Fake claims of discoveries are mainly caused by statistical prescriptions that do not follow
probabilistic reasoning, meant as the mathematics of beliefs, as it was conceived as a whole by
Laplace and as it is nowadays known under the name 'Bayesian'. As a consequence:

• the concept of probability of causes is refused; 

• the role of Bayes' theorem in updating beliefs is rejected, and hence

— the role of prior knowledge is not explicitly recognized;

— the myth has been created that a single hypothesis can be 'tested' without explicitly
taking into account alternative(s);

• the intuitive concept of 'probabilities of causes' has been replaced by ad hoc
hypothesis test prescriptions,

— whose choice and use are rather arbitrary; 

— whose results are routinely misinterpreted. 

Unfortunately, this wobbly construction clashes with the human predisposition to think
naturally in terms of degrees of belief about anything of which we are uncertain,
including the several causes that might have produced the observed effects. The result of
this mismatch is that

• probabilities of the effects given the causes are confused with the probabilities of the 
causes given the effects; 

• even worse, p-values are used as if they were the probability that the hypothesis under
test is true.

In addition, the claim that 'priors are not scientific and should not enter the game'
("the data should speak for themselves") prevents sound scientific priors from mitigating the
deleterious effects of misunderstood p-values.

But, fortunately, since the natural intuition of physicists is rather 'Bayesian' [20], after all
it is more a question of rough scientific communication than of rough science. In fact, even
the initial excitement of someone who takes a bit too seriously claims that the rest of the
physics community immediately classifies as 'fake' (priors!) is harmless, as long as the
discussions remain within the community. And the debates are often even profitable, because
they offer an opportunity to check how new possible phenomena and new explanations could fit
into the present network of beliefs based on all previous experimental observations. This is,
for example, what has recently happened with the exchange of ideas that followed the OPERA
result on neutrino speed, from which most of us have learned something.

As far as the communication of claims to non-experts is concerned (including physicists of
other branches, or even of a close sub-branch), my recommendation is to make use, at
least qualitatively, of the Bayesian odds update, i.e.

• state how much the experimental data push towards either possibility (that is the
Bayes factor, which has nothing to do with p-values);

• state also how believable the two hypotheses are independently of the data in question.

I am pretty sure most people can make good use of these pieces of information. Moreover,
my recommendation to journalists and opinion makers (including bloggers and similar) is
the following, in case of doubt:

• don't accept answers in terms of p-values, unless you are sure you understand them
well and you feel capable of explaining their correct meaning to the general public without
them somehow becoming probabilities of the hypotheses to be compared (good luck!);

• refuse as well 'confidence levels', '95% confidence exclusion curves' and similar; 

• ask the direct questions straight out:

— How probable is it? (Possibly informing - threatening! - him/her in advance
that his/her answer will be reported as "Dr X.Y. considers it such and such
percent probable".)

— How much do you believe it? (Same as the previous one.)

— How much would you believe in either hypothesis if you did not have these data?
(The answer allows you to estimate the prior odds.)

— How much would you believe in either hypothesis given these data, if you
considered the two hypotheses initially equally probable? (The answer allows you
to evaluate the Bayes factor.)

— How much would you bet in favor of your claim? (And if you realize that the
conditions described in section 5.3 and figure 2 apply, don't miss the opportunity
to earn some money!)
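The answers to the third and fourth questions can be combined mechanically. The following Python sketch assumes the answers are given as probabilities for one of two hypotheses. It turns the 'no data' answer into prior odds and the 'equal priors' answer into a Bayes factor, then multiplies them. With equal priors the posterior odds coincide with the Bayes factor. The function names and the interview answers are hypothetical.

```python
def odds(p):
    """Convert a probability in (0, 1) into odds p : (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def prob(o):
    """Convert odds back into a probability."""
    return o / (1.0 + o)


def combine_answers(prob_without_data, prob_given_data_equal_priors):
    """Posterior probability of hypothesis H1 (vs H2) from two interview answers:
    - belief in H1 if the data were not available -> prior odds;
    - belief in H1 given the data, starting from 1:1 -> Bayes factor.
    """
    prior_odds = odds(prob_without_data)
    bayes_factor = odds(prob_given_data_equal_priors)
    return prob(bayes_factor * prior_odds)


# Hypothetical interview: "1 in 1000 before the data", "99% if they were equally likely".
print(round(combine_answers(0.001, 0.99), 3))  # 0.09
```

A '99%' answer given equal priors thus shrinks to about 9% once a 1 in 1000 prior is taken into account.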

To end, I would like to congratulate all people working at the LHC on the amazingly high-quality
work done in these years and on having been able to report these convincing hints of the
Higgs boson in record time (even six months ago I would not have bet in favor of this
possibility in 2011! But now the really exciting bet is: what next?).
Appendix: '???' at Fermilab Vs Higgs boson at CERN



Since, just before I was going to post this paper, there was the joint ATLAS and CMS
seminar on the Higgs boson search at CERN, followed by days of rumors, I cannot avoid
adding here some last-minute comments on these results, comparing them with the CDF case.

The big difference between the Fermilab result discussed here and that of CERN [31] is
essentially a question of priors, whose role was discussed in section 5. If we observe
something unexpected, we need overwhelming experimental evidence before we are convinced
that it is really a genuine discovery, which is not the case for the highly expected Higgs
at the LHC. These are the arguments in favor of the fact that the elusive beast has finally
been surrounded (every particle hunter sniffs it, although it will only be considered finally
'in our hands' when we are able, with an increasing number of events, to study its behavior,
such as decay modes etc.):

• the so called Standard Model of particle physics provides an excellent description of 
a network of experimental facts, and such a particle is required to give a sense to the 
theory; 

• the indirect information on the Higgs boson ('radiative corrections') constrains its
mass to the order of magnitude of 100 GeV (although with a large uncertainty - see
[18] for a probability distribution, even though this has been slightly changing with
time);

• direct searches at LEP have pushed its mass almost certainly above 114 GeV;

• similarly, direct searches at the LHC and at the Tevatron have squeezed its mass
value into a relatively narrow window (I spare the reader yet another disquisition on the
meaning of those limits);

• the CERN indication shows up 

— in the middle of the remaining window of possibility (and hence not in
contradiction with other experimental pieces of information);

— with a production rate in agreement with the theory and with many other
experiments (from which the theoretical parameters have been inferred);

— with decay modes also in substantial agreement with expectations; 

— in two detectors, although with some differences that can be considered
physiological, taking into account the difficulty of the search.

As mentioned in footnote 5, the 95% CL bound has nothing to do with a 95% probability that the mass
value is above the bound. Translating the experimental information from the direct search into probabilistic
assessments is not that easy, because the number also depends on the upper limits. In particular, if there
were 'no' upper bound on the mass (which obviously cannot weigh grams!) there would be no way to calculate
the required probability. For further details see [18] and chapter 13 of [6].
In addition, I would also like to remark that the presentations of the two team leaders have
been rather prudent, as if, instead of the Higgs, it were just an unexpected bunch of
extra events in the middle of nowhere.
Some further remarks are in order.

• The reason why practically every particle physicist is highly confident that the Higgs
is in the region indicated by the LHC has little to do with the number of sigmas (I hope
the reader now understands that the mythical value of 5 for a 'true discovery' is by
itself pure nonsense, as is clear from the comparison between '???' at the Tevatron
and the Higgs boson at CERN, in the only place it could be after it had been hunted
unfruitfully elsewhere).

• This number of sigmas cannot be turned into probabilistic statements (or odds!) about
Higgs or not-Higgs, as we read again in The New York Times:

The Atlas result has a chance of less than one part in 5,000 of
being due to a lucky background noise, which is impressive but far
short of the standard for a "discovery," which requires one in 3.5 million
odds of being a random fluctuation. [30]

(Again misinterpreted p-values - basta!)

• Instead, if we want to make quantitative probabilistic assessments, we need the
likelihoods (this time the noun has the technical meaning statisticians use) for each
experiment and each channel, instead of the frequentistic 95% CL exclusion curves,
which are of dubious meaning and useless to combine. A plea to the LHC collaborations is
therefore in order: please publish likelihoods.

• In the past days I have visited some internet resources to check the rumors. As a
result

— I have seen quite a lot of 'creative thinking' concerning related statistics/probability
matters (starting from the New York Times article cited above) and you can
amuse yourself browsing the web. I would just like to suggest to Italian readers
http://www.keplero.org/2011/12/higgs.html, where there are some attempts (in
particular by nicola farina and Moping Owl) to clarify some probabilistic issues;

— in the name of many contributors to forums and blogs, I make a special plea to
my physicist colleagues and to journalists:

please stop relating the Higgs boson to God!
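The following Python sketch illustrates two of the remarks above numerically, under stated assumptions. The first part computes the one-sided Gaussian tail probability beyond 5 sigma, which is where the 'one in 3.5 million' of the quoted article comes from. That number is the probability of a background-only fluctuation at least that large, not the probability that the signal is a fluctuation. The second part shows why published likelihoods are useful: for independent channels they can simply be multiplied (log-likelihoods added). The two Gaussian-shaped likelihoods for a signal-strength parameter mu are hypothetical. Real likelihoods are in general not Gaussian and are not summarized by two numbers.

```python
from math import erfc, sqrt


def one_sided_tail(n_sigma):
    """P(Z >= n_sigma) for a standard Gaussian Z: the tail probability behind
    'n sigma' statements (a probability of data given background only)."""
    return 0.5 * erfc(n_sigma / sqrt(2.0))


print(f"5 sigma: 1 in {1 / one_sided_tail(5.0):,.0f}")

# Combining likelihoods of independent channels: log-likelihoods add.
# Hypothetical Gaussian-shaped likelihoods for a signal strength mu,
# each summarized as (best estimate, width).
channels = [(1.2, 0.5), (0.8, 0.4)]


def combined_log_likelihood(mu):
    return sum(-0.5 * ((est - mu) / width) ** 2 for est, width in channels)


grid = [i / 1000 for i in range(-2000, 4001)]  # mu from -2 to 4 in steps of 0.001
mu_best = max(grid, key=combined_log_likelihood)

# For Gaussian shapes the maximum is the inverse-variance weighted mean.
weights = [1 / width**2 for _, width in channels]
mu_weighted = sum(w * est for w, (est, _) in zip(weights, channels)) / sum(weights)
print(round(mu_best, 3), round(mu_weighted, 3))
```

With a flat prior on mu, the combined likelihood is also proportional to the posterior density of mu. Two 95% CL exclusion curves cannot be combined in a comparable way.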



And what if it did not exist at all? OK, formulate the alternative model and try to assess your beliefs in
the alternatives.

I definitely hope that when this influential newspaper reports on the probability of important, uncertain
scenarios that really matter for our lives, such as the economy, health, international crises, the future of
the Planet and so on, its experts really know what they are talking about!